Non-Deterministic Policies in Markovian Processes
Author
Abstract
Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision-making problems in such environments. In recent years, attempts have been made to apply reinforcement learning methods to construct adaptive treatment strategies, where a sequence of individualized treatments is learned from clinical data. Although these methods have proved useful in sequential decision-making problems, they cannot be applied in their current form to medical domains, as they lack widely accepted notions of confidence measures. Moreover, the policies provided by most reinforcement learning methods are highly prescriptive and leave little room for the doctor's input. Without the ability to provide flexible guidelines and statistical guarantees, it is unlikely that these methods can gain ground within the medical community. This thesis introduces the new concept of non-deterministic policies to capture the user's decision-making process. We use this concept to offer the user a flexible choice among near-optimal solutions and to provide statistical guarantees for decisions made under uncertainty. We provide two algorithms that propose flexible options to the user while ensuring that performance always remains close to optimal. We then show how to provide confidence measures over the value function of Markovian processes, and finally use them to find sets of actions that will almost surely include the optimal one.
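To make the central idea concrete, the following Python sketch (an illustration, not the thesis's actual algorithms) derives a non-deterministic, set-valued policy from a Q-table by keeping, in each state, every action whose value is within eps of the best one. The names Q, eps, and near_optimal_policy are assumptions introduced here, and the per-state threshold is a simplification of the thesis's guarantee that overall performance stays close to optimal.

    import numpy as np

    def near_optimal_policy(Q, eps):
        # For each state, keep every action whose Q-value lies within
        # eps of that state's best value: a set-valued policy offering
        # the user a choice among near-optimal actions.
        policy = {}
        for s in range(Q.shape[0]):
            best = Q[s].max()
            policy[s] = [a for a in range(Q.shape[1]) if Q[s, a] >= best - eps]
        return policy

    # Toy Q-table: 3 states, 2 actions.
    Q = np.array([[1.0, 0.95],
                  [0.2, 0.90],
                  [0.5, 0.50]])
    print(near_optimal_policy(Q, eps=0.1))
    # -> {0: [0, 1], 1: [1], 2: [0, 1]}

A doctor could then pick any action from the proposed set, retaining clinical judgment, while the eps bound caps the loss relative to the optimal policy.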
Similar Resources
Non-Deterministic Policies in Markovian Decision Processes
Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision-making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct decision support systems for action selection in Markovian environments. Although conventional meth...
Learning in a state of confusion: employing active perception and reinforcement learning in partially observable worlds
In applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately these algorithms which attempt to simultaneously learn a Markov m...
On the Ergodicity of Wireless Packet Transmission with Markovian Deterministic Scheduling Policies and Memoryless Traffic Inputs
Dynamic Pricing and Inventory Control: the Value of Demand Learning
This paper studies various approaches to demand learning in the context of a one-shot inventory replenishment problem with dynamic pricing. The customer arrival process is assumed to be piecewise deterministic and Markovian with an unknown parameter. Homogeneous customers have an iso-elastic demand function and do not behave strategically. We study full information, non-learning, passive learni...
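For reference, an iso-elastic demand function has constant price elasticity; the minimal sketch below (the parameter names a and b and both function names are assumptions made here, not taken from the paper) shows that with elasticity magnitude b > 1 the revenue rate a * p**(1 - b) falls as the price rises.

    def demand(p, a=10.0, b=2.0):
        # Iso-elastic demand D(p) = a * p**(-b): constant elasticity -b.
        return a * p ** (-b)

    def revenue_rate(p, a=10.0, b=2.0):
        # Instantaneous revenue p * D(p) = a * p**(1 - b).
        return p * demand(p, a, b)

    for p in (1.0, 2.0, 4.0):
        print(p, demand(p), revenue_rate(p))
    # 1.0 -> demand 10.0,  revenue 10.0
    # 2.0 -> demand 2.5,   revenue 5.0
    # 4.0 -> demand 0.625, revenue 2.5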
Stationarity, Time-Reversal and Fluctuation Theory for a Class of Piecewise Deterministic Markov Processes
We consider a class of stochastic dynamical systems, called piecewise deterministic Markov processes, with states (x, σ) ∈ Ω × Γ, Ω being a region in R^d or the d-dimensional torus and Γ being a finite set. The continuous variable x follows a piecewise deterministic dynamics, the discrete variable σ evolves by a stochastic jump dynamics, and the two resulting evolutions are fully coupled. We study st...
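The coupled structure described above can be illustrated with a short Euler simulation; everything in this sketch (the two vector fields, the state-dependent jump rates, and the two-state flip) is an assumed toy model, not the paper's setting. The continuous state x drifts along the field selected by σ, while σ jumps at a rate that depends on x.

    import random

    def simulate_pdmp(x0, sigma0, fields, rates, T=5.0, dt=0.01):
        # Euler discretization: x follows the deterministic field chosen
        # by sigma; in each step of length dt, sigma flips with
        # probability approximately rate * dt.
        x, sigma, t = x0, sigma0, 0.0
        while t < T:
            x += fields[sigma](x) * dt
            if random.random() < rates[sigma](x) * dt:
                sigma = 1 - sigma  # two-state example: switch regimes
            t += dt
        return x, sigma

    # Two flows on the real line, each relaxing toward a different
    # fixed point, with jump rates that depend on the continuous state.
    fields = {0: lambda x: 1.0 - x, 1: lambda x: -1.0 - x}
    rates = {0: lambda x: 0.5 + abs(x), 1: lambda x: 0.5 + abs(x)}
    print(simulate_pdmp(0.0, 0, fields, rates))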
Publication date: 2009